A Distance-based Separability Measure for Internal Cluster Validation

Abstract

Evaluating clustering results is a significant part of cluster analysis. Since there are no true class labels for clustering in typical unsupervised learning, many internal cluster validity indices (CVIs), which use predicted labels and data, have been created. Without true labels, designing an effective CVI is as difficult as creating a clustering method. It is also crucial to have more CVIs, because there is no universal CVI that can be used to measure all datasets and no specific method for selecting a proper CVI for clusters without true labels. Therefore, applying a variety of CVIs to evaluate clustering results is necessary. In this paper, we propose a novel internal CVI, the Distance-based Separability Index (DSI), based on a data separability measure. We compared the DSI with eight internal CVIs, ranging from the early Dunn index (1974) to the most recent CVDD (2019), and with an external CVI as ground truth, using the clustering results of five clustering algorithms on 12 real and 97 synthetic datasets. Results show that DSI is an effective, unique, and competitive CVI compared with other CVIs. We also summarized the general process for evaluating CVIs and created the rank-difference metric for comparing CVIs' results.
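
The abstract does not state the DSI formula, so the following is only a minimal sketch under an assumption: that the index compares, for each cluster, the set of intra-cluster distances with the set of between-cluster distances using the two-sample Kolmogorov-Smirnov statistic, and averages the result over clusters. The function names `ks_statistic` and `dsi_sketch` and the toy data are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from itertools import combinations
from math import dist


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def dsi_sketch(points, labels):
    """Assumed DSI-style score: mean over clusters of the KS statistic
    between intra-cluster and between-cluster Euclidean distances."""
    scores = []
    for c in sorted(set(labels)):
        inside = [p for p, l in zip(points, labels) if l == c]
        outside = [p for p, l in zip(points, labels) if l != c]
        # Intra-cluster distances: all pairs within cluster c.
        icd = [dist(p, q) for p, q in combinations(inside, 2)]
        # Between-cluster distances: cluster c versus everything else.
        bcd = [dist(p, q) for p in inside for q in outside]
        if icd and bcd:  # skip singleton clusters or a single-cluster labeling
            scores.append(ks_statistic(icd, bcd))
    if not scores:
        raise ValueError("need at least two clusters, one with two or more points")
    return sum(scores) / len(scores)


# Toy data: two compact, well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]    # labels match the two groups
mixed_labels = [0, 1, 0, 1, 0, 1]   # labels cut across the groups

print(dsi_sketch(points, good_labels))
print(dsi_sketch(points, mixed_labels))
```

Under this assumed definition, a labeling that matches well-separated groups scores higher than one that mixes them, because the intra-cluster and between-cluster distance distributions barely overlap in the first case.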

Related resources

A Survey on Internal Validity Measure for Cluster Validation

Data clustering is a technique for finding similar characteristics in a data set, which are always hidden in nature, and grouping the data into groups called clusters. Different clustering algorithms produce different results, since they are very sensitive to the characteristics of the original data set, especially noise and dimension. The quality of such clustering process determines the purity o...

A Separability Index for Distance-based Clustering and Classification Algorithms

We propose a separability index that quantifies the degree of difficulty in a hard clustering problem under assumptions of a multivariate Gaussian distribution for each cluster. A preliminary index is first defined and several of its properties are explored both theoretically and numerically. Adjustments are then made to this index so that the final refinement is also interpretable in terms of ...

Validity Measure of Cluster Based On the Intra-Cluster and Inter-Cluster Distance

The k-means method has been shown to be effective in producing good clustering results for many practical applications. However, a direct algorithm of k-means method requires time proportional to the product of number of patterns and number of clusters per iteration. This is computationally very expensive especially for large datasets. The main disadvantage of the k-means algorithm is that the ...

A Compression Based Distance Measure for Texture

The analysis of texture is an important subroutine in application areas as diverse as biology, medicine, robotics and forensic science. While the last three decades have seen extensive research in algorithms to measure texture similarity, almost all existing methods require the careful setting of many parameters. There are many problems associated with a surfeit of parameters, the most obvious ...

Journal

Journal title: International Journal on Artificial Intelligence Tools

Year: 2022

ISSN: 1793-6349, 0218-2130

DOI: https://doi.org/10.1142/s0218213022600053